YJTI at the NTCIR-13 STC Japanese Subtask
Abstract
In this paper, we describe our participation in the NTCIR-13 STC Japanese Subtask, for which we developed retrieval-based systems. To retrieve reply texts for a given comment text, our system uses a 3-layer LSTM-RNN to generate vector representations of both the comment and the candidate replies, evaluates the distances between the comment vector and the candidate reply vectors, selects the top-k nearest reply vectors, and returns the corresponding reply texts. We also take Theme and Genre into consideration to decide the final ranking. As candidate reply texts, we use all the comments and replies in the training set of the Yahoo! News comments data. Our two runs are based on two different LSTM-RNN models, one trained on Twitter conversation data and the other trained mainly on Yahoo! Chiebukuro QA data. Each dataset contains no fewer than 60 million text pairs, and we aim to show how effective these combinations of large-scale datasets and large-scale neural models are for developing dialog systems. In addition, we had assumed that the model trained on Twitter conversation data would outperform the one trained on Yahoo! Chiebukuro QA data, because the task domain seemed closer to conversations in microblog services than to social question answering, but the reported results showed that this was not the case.
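The retrieval step described in the abstract (embed the comment and the candidate replies, then return the top-k nearest replies) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the distance measure, so cosine distance is assumed here, the function name `top_k_replies` is hypothetical, and the toy vectors stand in for LSTM-RNN outputs. The Theme and Genre adjustment to the final ranking is not modeled.

```python
import numpy as np

def top_k_replies(comment_vec, reply_vecs, reply_texts, k=3):
    """Return the k (reply text, distance) pairs nearest to comment_vec."""
    # Normalize so that a dot product equals cosine similarity.
    c = comment_vec / np.linalg.norm(comment_vec)
    r = reply_vecs / np.linalg.norm(reply_vecs, axis=1, keepdims=True)
    # Cosine distance is an assumption; the abstract only says "distances".
    distances = 1.0 - r @ c
    nearest = np.argsort(distances, kind="stable")[:k]
    return [(reply_texts[i], float(distances[i])) for i in nearest]

# Toy 2-dimensional vectors standing in for encoder outputs.
comment = np.array([1.0, 0.0])
replies = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [0.7, 0.7]])
texts = ["reply A", "reply B", "reply C", "reply D"]
ranked = top_k_replies(comment, replies, texts, k=2)
```

With these toy vectors, the two replies whose directions are closest to the comment vector are returned first; in a real system the candidate vectors would be precomputed once and searched for every incoming comment.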
Similar resources
Response Generation for Grounding in Communication at NTCIR-13 STC Japanese Subtask
The AITOK team participated in the NTCIR-13 STC Japanese Subtask. This report describes our approach to generating responses to comment texts of Yahoo! News comments data, and discusses our formal-run results. Our approach aims to ensure grounding in communication and therefore integrates three strategies and five rules. The strategies are based on the presupposition that there is not enough infor...
Overview of the NTCIR-12 Short Text Conversation Task
We describe an overview of the NTCIR-12 Short Text Conversation (STC) task, which is a new pilot task of NTCIR-12. STC consists of two subtasks: a Chinese subtask using post-comment pairs crawled from Weibo, and a Japanese subtask providing the IDs of such pairs from Twitter. Thus, the main difference between the two subtasks lies in the sources and languages of the test collections. For the Ch...
YUILA at the NTCIR-12 Short Text Challenge: Combining Twitter Data with Dialogue System Logs
The YUILA team participated in the Japanese subtask of the NTCIR-12 Short Text Challenge task. This report describes our approach to solving the responsiveness problem in the STC task by using an external dialogue log corpus, and discusses the official results.
OKSAT at NTCIR-12 Short Text Conversation Task: Priority to Short Comments, Filtering by Characteristic Words and Topic Classification
Our group OKSAT submitted five runs for Chinese and Japanese subtasks of the NTCIR-12 Short Text Conversation task (STC). We searched not only posts but also comments for terms of each query (post). We also gave more priority to short comments than longer ones. Then we filtered retrieved comments by characteristic words including proper nouns. We added attributes to the corpus and also to the q...
SG01 at the NTCIR-13 STC-2 Task
We describe how we built the system for the NTCIR-13 Short Text Conversation (STC) Chinese subtask. In our system, we use the retrieval-based method and the generation-based method, respectively. For the retrieval-based method, we develop several features to match the candidates and then apply a learning-to-rank algorithm to get properly ranked results. For the generation-based method, we first gener...